Spherical Paragraph Model
Authors
Abstract
Representing texts as fixed-length vectors is central to many language processing tasks. Most traditional methods build text representations on the simple Bag-of-Words (BoW) representation, which loses the rich semantic relations between words. Recent advances in natural language processing have shown that semantically meaningful word representations can be acquired efficiently by distributed models, making it possible to build text representations on a better foundation, the Bag-of-Word-Embeddings (BoWE) representation. However, existing text representation methods that use BoWE often lack sound probabilistic foundations or fail to capture the semantic relatedness encoded in word vectors. To address these problems, we introduce the Spherical Paragraph Model (SPM), a probabilistic generative model based on BoWE, for text representation. SPM has good probabilistic interpretability and can fully leverage the rich semantics of words, word co-occurrence information, and corpus-wide information to aid the representation learning of texts. Experimental results on topical classification and sentiment analysis demonstrate that SPM achieves new state-of-the-art performance on several benchmark datasets.
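The contrast the abstract draws between BoW and BoWE can be illustrated with a minimal sketch. It is not the SPM model itself: the 2-dimensional vectors in `TOY_EMBEDDINGS` are made up, the helper names are hypothetical, and scaling each word vector to unit length is an assumption made here for illustration rather than something the abstract states.

```python
from collections import Counter
import math

# Hypothetical 2-d word vectors, invented purely for illustration.
TOY_EMBEDDINGS = {
    "good": (0.9, 0.1),
    "great": (0.8, 0.2),
    "bad": (-0.9, 0.1),
    "movie": (0.0, 1.0),
}


def bow(tokens):
    """Bag-of-Words: map each token to its count; words are unrelated symbols."""
    return Counter(tokens)


def unit(vec):
    """Scale a vector to unit length (an assumption of this sketch)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return tuple(x / norm for x in vec)


def bowe(tokens, embeddings=TOY_EMBEDDINGS):
    """Bag-of-Word-Embeddings: the text as a list of unit-length word vectors."""
    return [unit(embeddings[t]) for t in tokens if t in embeddings]


def cosine(u, v):
    """Cosine similarity; for unit vectors this is just the dot product."""
    return sum(a * b for a, b in zip(u, v))


doc_a = ["good", "movie"]
doc_b = ["great", "movie"]

# Under BoW the two texts overlap only on the literal token "movie".
shared_tokens = set(bow(doc_a)) & set(bow(doc_b))

# Under BoWE, "good" and "great" are close, and "good" and "bad" are far apart.
sim_good_great = cosine(bowe(doc_a)[0], bowe(doc_b)[0])
sim_good_bad = cosine(unit(TOY_EMBEDDINGS["good"]), unit(TOY_EMBEDDINGS["bad"]))
```

In this toy setting BoW treats "good" and "great" as unrelated dimensions, while the embedding-based view exposes their similarity. That is the kind of semantic relatedness the abstract says BoW loses.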
Similar resources
The Impact of Teaching Developmental Grammatical Errors on Iranian Undergraduate Translator Trainees’ L2 Paragraph Writing Ability
This study aimed to investigate the impact of teaching developmental grammatical errors on translator trainees’ L2 paragraph writing ability. After administration of the Oxford Placement Test (OPT), 40 out of 100 undergraduate translator trainees were selected from the Tonekabon branch of Islamic Azad University. Further, the participants were divided into an experimental and a control group...
The Comparative Effect of Using Idioms in Conversation and Paragraph Writing on EFL Learners’ Idiom Learning
This study investigated the comparative effect of teaching idiomatic expressions through practicing them in conversation and paragraph writing on intermediate EFL learners’ idiom learning. The participants were selected from a population of 134 intermediate students in Zabansara Language School in Khorramabad based on their scores on a Preliminary English Test (PET) and an idiom test piloted in...
Paragraph vector based topic model for language model adaptation
Topic modeling is an important approach for language model (LM) adaptation and has attracted research interest for a long time. Latent Dirichlet Allocation (LDA), which assumes a generative Dirichlet distribution with bag-of-word features for hidden topics, has been widely used as the state-of-the-art topic model. Inspired by the recent development of a new paradigm of distributed paragraph representati...
Learning to Distill: The Essence Vector Modeling Framework
In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this school of research. However, paragraph (or sentence and document) embedding learning is more suitable/reasonable for some tasks, such as sentiment classific...
Graph-based normalization
Abstract. In this paper we construct a graph-based normalisation algorithm for non-linear data analysis. The principle of this algorithm is to obtain, on average, spherical neighborhoods of unit radius. In the first section we show why this algorithm can be useful as a preliminary step for some neural algorithms, such as those that need to compute geodesic distances. Then we present the algorithm, its stochastic v...
Publication date: 2018